Part I - Ford GoBike System Dataset

by Saoban Lateefat

Introduction

This dataset is based on individual rides made in a bike-sharing system covering the greater San Francisco Bay area.

Preliminary Wrangling

What is the structure of your dataset?

The dataset originally has 183412 rows and 16 columns: 'duration_sec', 'start_time', 'end_time', 'start_station_id', 'start_station_name', 'start_station_latitude', 'start_station_longitude', 'end_station_id', 'end_station_name', 'end_station_latitude', 'end_station_longitude', 'bike_id', 'user_type', 'member_birth_year', 'member_gender' and 'bike_share_for_all_trip'. After wrangling, the dataset has 183215 rows and 28 columns: the 16 original columns plus 12 derived ones, which are 'start_day', 'start_month', 'start_year', 'start_weekday', 'start_weekday_name', 'end_day', 'end_month', 'end_year', 'end_weekday', 'end_weekday_name', 'duration_min' and 'member_age'.
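The derived columns can be produced with a few pandas datetime operations. The snippet below is only a minimal sketch on a two-row toy frame, not the exact wrangling code used here: the helper name `add_derived_columns` is hypothetical, and computing 'member_age' as the start year minus the birth year is an assumption.

```python
import pandas as pd


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the date parts, duration_min and member_age added."""
    out = df.copy()
    for prefix in ("start", "end"):
        ts = pd.to_datetime(out[f"{prefix}_time"])
        out[f"{prefix}_time"] = ts
        out[f"{prefix}_day"] = ts.dt.day
        out[f"{prefix}_month"] = ts.dt.month
        out[f"{prefix}_year"] = ts.dt.year
        out[f"{prefix}_weekday"] = ts.dt.weekday  # Monday=0 ... Sunday=6
        out[f"{prefix}_weekday_name"] = ts.dt.day_name()
    out["duration_min"] = out["duration_sec"] / 60
    # Assumption: age is taken relative to the year the ride started.
    out["member_age"] = out["start_year"] - out["member_birth_year"]
    return out


# Toy rows that only mimic the shape of the real data.
sample = pd.DataFrame({
    "duration_sec": [360, 720],
    "start_time": ["2019-02-28 17:32:10", "2019-02-01 08:05:00"],
    "end_time": ["2019-02-28 17:38:10", "2019-02-01 08:17:00"],
    "member_birth_year": [1988.0, 1975.0],
})
wrangled = add_derived_columns(sample)
print(wrangled[["start_weekday_name", "duration_min", "member_age"]])
```

Applied to the full dataset, the same helper would add 12 columns to the original 16, which matches the 28 columns reported after wrangling.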

What is/are the main feature(s) of interest in your dataset?

I will be looking for insights into how the columns affect usage of the bike-sharing system.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

I think weekday name, member age and user type should help most in my investigation.

Univariate Exploration

I start by looking at gender.

Count per gender

Male riders take the most rides, with a count of 138763, and the "Other" group (i.e. gender not identified) takes the fewest.

Count per user type

Subscribers make up 89.2% of riders and customers 10.8%, so most people who ride Ford GoBikes are subscribers.

Checking for the percentage of people who do share trips and those who do not

Here, we can see that far more people do not share rides (90.5%) than do share rides (just 9.5%).
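Percentage breakdowns like the ones reported for user type and ride sharing can be computed with `value_counts(normalize=True)`. This is a small sketch on made-up values; the helper name `percentage_share` is hypothetical, and the toy series stands in for a column such as 'user_type' or 'bike_share_for_all_trip'.

```python
import pandas as pd


def percentage_share(series: pd.Series) -> pd.Series:
    """Share of each category in percent, rounded to one decimal place."""
    return (series.value_counts(normalize=True) * 100).round(1)


# Toy data: 9 subscribers and 1 customer (not the real proportions).
toy = pd.Series(["Subscriber"] * 9 + ["Customer"])
shares = percentage_share(toy)
print(shares)
```

The resulting series can be passed straight to a pie or bar chart, with the index as labels.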

Duration of rides in seconds

Short rides are by far the most common: most rides last only about 6 minutes, and the average ride lasts about 12 minutes.

Age range of riders

Looking at the ages of the people who ride Ford GoBikes the most, we can see that most riders are between 22 and 40 years old, which is largely the working-age range.

Rides per weekday

Here, we can see that most rides occur on weekdays rather than weekends, with Thursday the busiest day. This is probably because weekdays are workdays.

Trips per hour

Most trips occur around 5pm and 8am. This is most likely due to rush hours, i.e. the times when most people leave work and leave for work.
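Hourly trip counts like these can be obtained by extracting the hour from 'start_time'. This is a sketch on six made-up timestamps, with a hypothetical helper name `trips_per_hour`.

```python
import pandas as pd


def trips_per_hour(start_time: pd.Series) -> pd.Series:
    """Number of trips starting in each hour of the day (0 to 23)."""
    hours = pd.to_datetime(start_time).dt.hour
    # reindex so that hours with no trips still appear with a count of 0
    return hours.value_counts().reindex(range(24), fill_value=0)


# Toy timestamps: three evening trips, two morning trips, one at noon.
toy = pd.Series([
    "2019-02-28 17:05:00",
    "2019-02-28 17:40:00",
    "2019-02-27 17:15:00",
    "2019-02-28 08:10:00",
    "2019-02-27 08:55:00",
    "2019-02-28 12:00:00",
])
counts = trips_per_hour(toy)
busiest = counts.nlargest(2).index.tolist()
print(busiest)
```

Keeping all 24 hours in the index makes the bar chart's x-axis complete even when some hours have no trips.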

Top 10 Stations

The top 10 start stations and the top 10 end stations are largely the same.
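One way to check how much the two top-10 lists overlap is to compare them as sets. The sketch below uses made-up station names and a top 2 instead of a top 10; the helper name `top_stations` is hypothetical.

```python
import pandas as pd


def top_stations(names: pd.Series, n: int = 10) -> list:
    """The n most frequent station names, busiest first."""
    return names.value_counts().head(n).index.tolist()


# Toy station names standing in for start_station_name / end_station_name.
starts = pd.Series(["A", "A", "A", "B", "B", "C"])
ends = pd.Series(["B", "B", "B", "A", "A", "D"])

top_start = top_stations(starts, n=2)
top_end = top_stations(ends, n=2)
shared = set(top_start) & set(top_end)
print(sorted(shared))
```

On the real data, `len(shared)` out of 10 gives a simple measure of how similar the two rankings are.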

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

Male riders have the most records, most of the people who take trips are subscribers, the most common rider age is 31, and most trips last only about 6 minutes.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

No, I did not find any unusual distributions in the dataset.

Bivariate Exploration

Gender per user type

There are more male subscribers than female subscribers.

Gender per bike share

Most males do not share trips

Trip duration per days

Trips are shorter on weekdays than on weekends.
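A weekday versus weekend comparison of trip duration can be made by grouping on whether the start day is a Saturday or Sunday. This is a sketch on four made-up trips; the helper name `mean_duration_by_day_type` is hypothetical, and comparing means (rather than medians or full distributions) is an assumption.

```python
import pandas as pd


def mean_duration_by_day_type(df: pd.DataFrame) -> pd.Series:
    """Mean trip duration in minutes for weekdays and for weekends."""
    weekday_number = pd.to_datetime(df["start_time"]).dt.weekday  # Monday=0
    day_type = (weekday_number >= 5).map({True: "weekend", False: "weekday"})
    return (df["duration_sec"] / 60).groupby(day_type).mean()


# Toy trips: Friday, Monday, Saturday, Sunday.
toy = pd.DataFrame({
    "start_time": [
        "2019-02-01 08:00:00",
        "2019-02-04 17:00:00",
        "2019-02-02 11:00:00",
        "2019-02-03 14:00:00",
    ],
    "duration_sec": [300, 420, 900, 1200],
})
by_day_type = mean_duration_by_day_type(toy)
print(by_day_type)
```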

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

In all, there are more male subscribers than female subscribers, most males do not share trips, and trips are shorter on weekdays than on weekends.

Multivariate Exploration

-- Customers spend more time on rides than subscribers
-- Male and female riders spend about the same time on rides on average
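Comparisons across user type and gender like the two points above can be tabulated with a pivot table. The sketch below uses six made-up rides and assumes a 'duration_min' column already exists; the helper name `mean_duration_table` is hypothetical.

```python
import pandas as pd


def mean_duration_table(df: pd.DataFrame) -> pd.DataFrame:
    """Mean duration_min for every user_type / member_gender combination."""
    return df.pivot_table(
        index="user_type",
        columns="member_gender",
        values="duration_min",
        aggfunc="mean",
    )


# Toy rides, chosen so that customers ride longer and genders look alike.
toy = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Subscriber",
                  "Subscriber", "Customer", "Subscriber"],
    "member_gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "duration_min": [20.0, 22.0, 10.0, 11.0, 24.0, 9.0],
})
table = mean_duration_table(toy)
print(table)
```

The same table can be drawn as a heatmap or a grouped bar chart to show both variables at once.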

Male riders spend less time on rides and also share trips the least.

Younger males tend to spend more time on rides than older males.

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

We can see that customers spend more time on rides than subscribers, and that people whose gender is not identified spend more time on rides than those whose gender is identified. Since male and female riders spend about the same number of minutes on rides, the time spent on a ride does not really depend on whether the rider is male or female.

Were there any interesting or surprising interactions between features?

Male riders share trips the least.

Conclusions

After the wrangling and visualizations, we can infer that:
-- There are more male riders than female riders
-- There are more subscribers than customers
-- Most trips are short, lasting only a few minutes
-- There are more trips on weekdays than on weekends
-- Most trips are made around 5pm and 8am, which we attribute to rush hours
-- More younger people than older people ride the bikes